library(tidyverse)
library(psych)
library(cowplot)
library(patchwork)
library(here)
library(brms) ## NEW PACKAGE
library(tidybayes) ## NEW PACKAGE
At this point in the term, our code will deviate from
McElreath's. His course is taught entirely using rethinking,
which is a pedagogical tool: it provides a clear mapping between mathematical
models and syntax, but it lacks flexibility and offers fewer modeling
options.
On the other hand, there is a package called brms that
also does Bayesian modeling. This package uses syntax similar to
lme4 (if you’ve used that), supports a wider range of
distributions, integrates with the tidyverse ecosystem (if
you’ve used that), has more extensive documentation, is more actively
maintained, is more widely used (i.e., more community support), and is better
suited to complex models.
You’re welcome to use the rethinking package when it
suits you, in this course and in your research, but my goal is to
introduce you to the brms package. Instead of reviewing the
code from McElreath’s lecture today, we’ll be revisiting some familiar
models using brms.
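To see the difference in style, here is the same kind of linear model sketched in both packages. The rethinking version uses quap(), which fits with a quadratic approximation rather than MCMC; treat this as an illustrative sketch, not code we will run.

```r
# rethinking: explicit, math-like notation
m_rethinking <- rethinking::quap(
  alist(
    weight ~ dnorm(mu, sigma),
    mu <- a + b * height_c,
    a ~ dnorm(130, 20),
    b ~ dnorm(0, 25),
    sigma ~ dunif(0, 50)
  ),
  data = d
)

# brms: lme4-style formula syntax
m_brms <- brms::brm(
  weight ~ 1 + height_c,
  data = d, family = gaussian,
  prior = c(
    prior(normal(130, 20), class = Intercept),
    prior(normal(0, 25), class = b),
    prior(uniform(0, 50), class = sigma, ub = 50)
  )
)
```

Notice that the brms formula hides the likelihood and linear model inside `weight ~ 1 + height_c`, while rethinking spells out every line of the statistical model.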
Let’s return to the height and weight data.
data(Howell1, package = "rethinking")
d <- Howell1
library(measurements)
d$height <- conv_unit(d$height, from = "cm", to = "feet")
d$weight <- conv_unit(d$weight, from = "kg", to = "lbs")
describe(d, fast = T)
## vars n mean sd median min max range skew kurtosis se
## height 1 544 4.54 0.91 4.88 1.77 5.88 4.1 -1.26 0.58 0.04
## weight 2 544 78.51 32.45 88.31 9.37 138.87 129.5 -0.54 -0.94 1.39
## age 3 544 29.34 20.75 27.00 0.00 88.00 88.0 0.49 -0.56 0.89
## male 4 544 0.47 0.50 0.00 0.00 1.00 1.0 0.11 -1.99 0.02
d <- d[d$age >= 18, ]
d$height_c <- d$height - mean(d$height)
\[\begin{align*} w_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta (h_i - \bar{h}) \\ \alpha &\sim \text{Normal}(130, 20) \\ \beta &\sim \text{Normal}(0, 25) \\ \sigma &\sim \text{Uniform}(0, 50) \\ \end{align*}\]
m42.1 <- brm(
data = d,
family = gaussian,
weight ~ 1 + height_c,
prior = c( prior( normal(130,20), class = Intercept),
prior( normal(0,25), class = b),
prior( uniform(0,50), class = sigma, ub = 50)
),
iter = 5000, warmup = 1000, chains = 4,
seed = 3,
file = here("files/data/generated_data/m42.1"))
brm() is the core function for fitting Bayesian models
using brms.
m42.1 <- brm(
data = d,
family = gaussian,
weight ~ 1 + height_c,
prior = c( prior( normal(130,20), class = Intercept),
prior( normal(0,25), class = b),
prior( uniform(0,50), class = sigma, ub = 50)
),
iter = 5000, warmup = 1000, chains = 4,
seed = 3,
file = here("files/data/generated_data/m42.1"))
family specifies the distribution of the outcome variable.
In many examples, we’ll use a gaussian (normal) distribution. But there
are many, many options for this.
m42.1 <- brm(
data = d,
family = gaussian,
weight ~ 1 + height_c,
prior = c( prior( normal(130,20), class = Intercept),
prior( normal(0,25), class = b),
prior( uniform(0,50), class = sigma, ub = 50)
),
iter = 5000, warmup = 1000, chains = 4,
seed = 3,
file = here("files/data/generated_data/m42.1"))
The formula argument is what you would expect from the
lm() and lmer() functions you have seen in the
past. The benefit of brms is that this formula can easily
handle complex and non-linear terms. We’ll play with more of these in
future classes.
m42.1 <- brm(
data = d,
family = gaussian,
weight ~ 1 + height_c,
prior = c( prior( normal(130,20), class = Intercept),
prior( normal(0,25), class = b),
prior( uniform(0,50), class = sigma, ub = 50)
),
iter = 5000, warmup = 1000, chains = 4,
seed = 3,
file = here("files/data/generated_data/m42.1"))
Here we set our priors. Class b refers to
population-level slope parameters (sometimes called fixed effects).
Again, this argument can become very detailed, specific,
and flexible, and we’ll play more with it later.
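When a model has several slopes, you can attach a prior to one specific coefficient with the coef argument, with a class-level prior covering any slopes you don't name. A sketch (the second predictor, age, is purely illustrative here):

```r
# Per-coefficient priors via `coef`, plus a class-wide default for class b
prior_spec <- c(
  prior(normal(0, 25), class = b),                   # default for all slopes
  prior(normal(40, 10), class = b, coef = height_c)  # overrides for height_c only
)
# m <- brm(weight ~ height_c + age, data = d, prior = prior_spec, ...)
```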
m42.1 <- brm(
data = d,
family = gaussian,
weight ~ 1 + height_c,
prior = c( prior( normal(130,20), class = Intercept),
prior( normal(0,25), class = b),
prior( uniform(0,50), class = sigma, ub = 50)
),
iter = 5000, warmup = 1000, chains = 4,
seed = 3,
file = here("files/data/generated_data/m42.1"))
Hamiltonian MCMC runs for a set number of iterations, throws away the first portion (the warmup), and repeats this process multiple times (the number of chains).
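With these settings, the number of retained draws works out to:

```r
iter <- 5000; warmup <- 1000; chains <- 4
# draws kept per chain, times the number of chains
post_warmup_draws <- (iter - warmup) * chains
post_warmup_draws  # 16000
```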
m42.1 <- brm(
data = d,
family = gaussian,
weight ~ 1 + height_c,
prior = c( prior( normal(130,20), class = Intercept),
prior( normal(0,25), class = b),
prior( uniform(0,50), class = sigma, ub = 50)
),
iter = 5000, warmup = 1000, chains = 4,
seed = 3,
file = here("files/data/generated_data/m42.1"))
Remember, these are random walks through parameter space, so set a seed for reproducibility. Also, these can take a while to run, especially when you are developing more complex models. If you specify a file, the output of the model will automatically be saved. Even better, the next time you run this code, R will check for that file and load it into your workspace instead of re-running the model. (Just be sure to delete the model file if you make changes to any other part of the code.)
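One way to force a refit after editing the model, sketched below; brms appends an .rds extension to whatever you pass to file, and you should adjust the path to wherever you save your fits:

```r
# Delete the cached fit so brm() re-samples on the next run
fit_path <- here("files/data/generated_data/m42.1.rds")
if (file.exists(fit_path)) file.remove(fit_path)
```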
summary(m42.1)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: weight ~ 1 + height_c
## Data: d (Number of observations: 352)
## Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup draws = 16000
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 99.21 0.50 98.23 100.18 1.00 14807 12164
## height_c 42.05 1.95 38.22 45.91 1.00 15921 12552
##
## Further Distributional Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 9.38 0.36 8.72 10.12 1.00 15626 12366
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
plot(m42.1)
Before we start to interpret our model, we should evaluate whether it does a good job. Posterior predictive checks plot the implied distribution of your outcome next to your actual distribution. If your posterior predictive values are off, your model is off.
pp_check(m42.1)
## Using 10 posterior draws for ppc type 'dens_overlay' by default.
We can also look at the posterior distributions of our chains. Remember, they should be covering most of the same space, so these distributions should pretty much overlap.
mcmc_plot(m42.1, type = "dens_overlay")
Let’s sample from the posterior. First, get_variables()
will tell us everything at our disposal.
get_variables(m42.1)
## [1] "b_Intercept" "b_height_c" "sigma" "Intercept"
## [5] "lprior" "lp__" "accept_stat__" "stepsize__"
## [9] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
Let’s focus on just the parameters we’ve estimated. In prior
lectures, we’ve drawn samples from the posterior distribution to
generate plots and provide summaries. We can use the
spread_draws() function to do so.
p42.1 <- m42.1 %>%
spread_draws(b_Intercept, b_height_c, sigma,
ndraws = 1e4, seed = 123)
dim(p42.1)
## [1] 10000 6
head(p42.1)
## # A tibble: 6 × 6
## .chain .iteration .draw b_Intercept b_height_c sigma
## <int> <int> <int> <dbl> <dbl> <dbl>
## 1 1 2463 2463 99.6 41.6 9.59
## 2 1 2511 2511 99.7 41.4 9.32
## 3 3 2419 10419 99.0 42.8 8.85
## 4 3 718 8718 99.4 39.9 8.74
## 5 4 483 12483 99.6 41.9 9.17
## 6 1 2986 2986 100. 43.0 9.46
lp__ is the log posterior. lprior is the log of the prior.
p42.1 %>%
ggplot(aes(x = b_Intercept)) +
geom_density(fill = "#1c5253", color = "white") +
labs(
title = "Posterior probability",
x = "probability of intercept (mean weight)"
) +
scale_y_continuous(NULL, breaks = NULL)
Finally, we might want to plot the bivariate distributions of our parameters.
pairs(m42.1)
If we were encountering this problem for the first time, we would want to work on our priors. These are pretty bad. We have a few tools available to help us define and test our priors.
First, let’s view the available priors for our model:
get_prior(
formula = weight ~ 1 + height_c,
data = d
)
## prior class coef group resp dpar nlpar lb ub
## (flat) b
## (flat) b height_c
## student_t(3, 98.7, 14.8) Intercept
## student_t(3, 0, 14.8) sigma 0
## source
## default
## (vectorized)
## default
## default
If you’re ever not sure what coefficients to put priors on, this function can help with that.
Let’s refit our model with our earlier priors. Before we fit this to data, we’ll start by only sampling from our priors.
m42.1p <- brm(
data = d,
family = gaussian,
weight ~ 1 + height_c,
prior = c( prior( normal(130,20), class = Intercept),
prior( normal(0,25), class = b),
prior( uniform(0,50), class = sigma, ub = 50)
),
iter = 5000, warmup = 1000,
seed = 3,
sample_prior = "only")
## Compiling Stan program...
## Start sampling
spread_draws() will now return samples from the
prior, not samples from the posterior.
p42.1p <- m42.1p %>%
spread_draws(b_Intercept, b_height_c, sigma)
head(p42.1p)
## # A tibble: 6 × 6
## .chain .iteration .draw b_Intercept b_height_c sigma
## <int> <int> <int> <dbl> <dbl> <dbl>
## 1 1 1 1 144. 17.0 31.2
## 2 1 2 2 116. -22.1 18.4
## 3 1 3 3 139. -37.5 32.0
## 4 1 4 4 158. -23.6 10.9
## 5 1 5 5 102. 20.3 35.9
## 6 1 6 6 139. 13.7 32.7
We’ll plot the regression lines from the priors against the real data, to see if they make sense.
labels = seq(4, 6, by = .5)
breaks = labels - mean(d$height)
d %>%
ggplot(aes(x = height_c, y = weight)) +
geom_blank()+
geom_abline(aes( intercept=b_Intercept, slope=b_height_c),
data = p42.1p[1:50, ], #first 50 draws only
color = "#1c5253",
alpha = .3) +
scale_x_continuous("height(feet)", breaks = breaks, labels = labels) +
scale_y_continuous("weight(lbs)", limits = c(50,150))
Let’s see if we can improve upon this model. One thing we know for sure is that the relationship between height and weight is positive. We may not know the exact magnitude, but we can use a distribution that doesn’t go below zero. We’ve already discussed uniform distributions, but those are pretty uninformative – they won’t do a good job regularizing – and we can also run into trouble if our bounds are not inclusive enough.
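To see why the Normal(0, 25) slope prior is wasteful here: it places half of its mass on negative slopes, which we believe are impossible. A quick check:

```r
# proportion of the Normal(0, 25) prior below zero
pnorm(0, mean = 0, sd = 25)  # 0.5
```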
The log-normal distribution would be a good option here.
set.seed(4)
tibble(b = rlnorm(1e4, mean = 0, sd = 1)) %>%
ggplot(aes(x = b)) +
geom_density(fill = "grey92") +
coord_cartesian(xlim = c(0, 5)) +
labs(title = "Log-Normal(0,1)")
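When picking log-normal parameters, keep in mind that the mean and sd arguments are on the log scale; the implied median and mean on the raw scale are exp(mu) and exp(mu + sd^2/2):

```r
mu <- 0; s <- 1
exp(mu)           # median of Log-Normal(0, 1): 1
exp(mu + s^2 / 2) # mean of Log-Normal(0, 1): ~1.65
```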
The log-normal is the distribution whose logarithm is normally distributed.
set.seed(4)
tibble(rnorm = rnorm(1e5, mean = 0, sd = 1),
`log(rlognorm)` = log(rlnorm(1e5, mean = 0, sd = 1))) %>%
pivot_longer(everything()) %>%
ggplot(aes(x = value)) +
geom_density(fill = "grey92") +
coord_cartesian(xlim = c(-3, 3)) +
facet_wrap(~ name, nrow = 2)
Let’s try this new prior. Play around with the plot code to find parameters that you think are reasonable. I’m going to use lognormal(1, 2).
m42.2p <- brm(
data = d,
family = gaussian,
weight ~ height_c,
prior = c( prior( normal(130,20), class = Intercept),
prior( lognormal(1,2), class = b),
prior( uniform(0,50), class = sigma, ub = 50)
),
iter = 5000, warmup = 1000,
seed = 3,
sample_prior = "only")
## Warning: It appears as if you have specified a lower bounded prior on a parameter that has no natural lower bound.
## If this is really what you want, please specify argument 'lb' of 'set_prior' appropriately.
## Warning occurred for prior
## b ~ lognormal(1, 2)
## Compiling Stan program...
## Start sampling
## Warning: There were 14061 divergent transitions after warmup. See
## https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: There were 81 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## https://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: The largest R-hat is 1.11, indicating chains have not mixed.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#r-hat
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#tail-ess
p42.2p <- m42.2p %>%
spread_draws(b_Intercept, b_height_c, sigma)
d %>%
ggplot(aes(x = height_c, y = weight)) +
geom_blank()+
geom_abline(aes( intercept=b_Intercept, slope=b_height_c),
data = p42.2p[1:50, ], #first 50 draws only
color = "#1c5253",
alpha = .3) +
scale_x_continuous("height(feet)", breaks = breaks, labels = labels) +
scale_y_continuous("weight(lbs)", limits = c(50,150))
Applied to our dataset:
m42.2 <- brm(
data = d,
family = gaussian,
weight ~ height_c,
prior = c( prior( normal(130,20), class = Intercept),
prior( lognormal(1,2), class = b),
prior( uniform(0,50), class = sigma, ub = 50)
),
iter = 5000, warmup = 1000,
seed = 3,
file = here("files/data/generated_data/m42.2"))
summary(m42.2)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: weight ~ height_c
## Data: d (Number of observations: 352)
## Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup draws = 16000
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 99.20 0.50 98.22 100.18 1.00 13657 11564
## height_c 42.16 2.01 38.26 46.07 1.00 17384 12434
##
## Further Distributional Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 9.39 0.36 8.72 10.13 1.00 15405 11446
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Let’s return to the tidybayes functions for summaries.
As a reminder, we’ve already seen spread_draws():
post_draws = m42.2 %>%
spread_draws(b_Intercept, b_height_c, sigma) %>%
sample_n(50)
m_height <- mean(d$height)
d %>%
ggplot(aes(x = height, y = weight)) +
geom_point(alpha = .5) +
geom_abline(aes(intercept = b_Intercept - b_height_c*m_height, #to account for centering
slope = b_height_c),
alpha = .3,
color = "#1c5253",
data = post_draws)
Let’s practice what we’ve learned using a different dataset. We’ll
use the built-in msleep dataset from the
ggplot2 package, which contains sleep data for various
mammals.
data("msleep")
d_sleep <- msleep %>%
drop_na(sleep_total, bodywt) %>%
mutate(log_weight = log(bodywt))
# Quick look at the data
describe(d_sleep[c("sleep_total", "bodywt", "log_weight")], fast = T)
## vars n mean sd median min max range skew kurtosis se
## sleep_total 1 83 10.43 4.45 10.10 1.9 19.9 18.0 0.05 -0.71 0.49
## bodywt 2 83 166.14 786.84 1.67 0.0 6654.0 6654.0 7.10 53.72 86.37
## log_weight 3 83 0.84 3.26 0.51 -5.3 8.8 14.1 0.30 -0.76 0.36
Our goal is to model the relationship between body weight and total sleep duration. Because body weight is highly skewed, we’ll use the log-transformed weight.
Let’s set up a model to predict sleep duration from log body weight:
\[\begin{align*} \text{sleep}_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta (\text{log_weight}_i) \\ \alpha &\sim \text{Normal}(12, 4) \\ \beta &\sim \text{Normal}(0, 2) \\ \sigma &\sim \text{Uniform}(0, 10) \\ \end{align*}\]
Fit a model with brm() that samples only from the
priors (sample_prior = "only"):
m_sleep_prior <- brm(
data = d_sleep,
family = gaussian,
sleep_total ~ log_weight,
prior = c(
prior(normal(12, 4), class = Intercept),
prior(normal(0, 2), class = b),
prior(uniform(0, 10), class = sigma)
),
sample_prior = "only",
seed = 123
)
## Warning: It appears as if you have specified an upper bounded prior on a parameter that has no natural upper bound.
## If this is really what you want, please specify argument 'ub' of 'set_prior' appropriately.
## Warning occurred for prior
## <lower=0> sigma ~ uniform(0, 10)
## Compiling Stan program...
## Start sampling
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 2e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.2 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.009 seconds (Warm-up)
## Chain 1: 0.011 seconds (Sampling)
## Chain 1: 0.02 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 2e-06 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.02 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 0.012 seconds (Warm-up)
## Chain 2: 0.005 seconds (Sampling)
## Chain 2: 0.017 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 1e-06 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 3: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 0.01 seconds (Warm-up)
## Chain 3: 0.022 seconds (Sampling)
## Chain 3: 0.032 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 1e-06 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 4: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 0.011 seconds (Warm-up)
## Chain 4: 0.008 seconds (Sampling)
## Chain 4: 0.019 seconds (Total)
## Chain 4:
## Warning: There were 3232 divergent transitions after warmup. See
## https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#bulk-ess
p_sleep_p <- m_sleep_prior %>%
spread_draws(b_Intercept, b_log_weight, sigma)
d_sleep %>%
ggplot(aes(x = log_weight, y = sleep_total)) +
geom_blank()+
geom_abline(aes( intercept=b_Intercept, slope=b_log_weight),
data = p_sleep_p[1:50, ], #first 50 draws only
color = "#1c5253",
alpha = .3) +
labs(
x = "weight (log)",
y = "sleep"
)
Once you’re happy with your priors, fit the actual model to the data. Then:
m_sleep_fit <- brm(
data = d_sleep,
family = gaussian,
sleep_total ~ log_weight,
prior = c(
prior(normal(12, 4), class = Intercept),
prior(normal(0, 2), class = b),
prior(uniform(0, 10), class = sigma)
),
seed = 123,
file = here("files/data/generated_data/m2.sleep")
)
## Warning: It appears as if you have specified an upper bounded prior on a parameter that has no natural upper bound.
## If this is really what you want, please specify argument 'ub' of 'set_prior' appropriately.
## Warning occurred for prior
## <lower=0> sigma ~ uniform(0, 10)
## Compiling Stan program...
## Start sampling
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 2.1e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.011 seconds (Warm-up)
## Chain 1: 0.007 seconds (Sampling)
## Chain 1: 0.018 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 3e-06 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 0.014 seconds (Warm-up)
## Chain 2: 0.006 seconds (Sampling)
## Chain 2: 0.02 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 1e-06 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 3: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 0.016 seconds (Warm-up)
## Chain 3: 0.007 seconds (Sampling)
## Chain 3: 0.023 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 1e-06 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 4: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 0.011 seconds (Warm-up)
## Chain 4: 0.007 seconds (Sampling)
## Chain 4: 0.018 seconds (Total)
## Chain 4:
summary(m_sleep_fit)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: sleep_total ~ log_weight
## Data: d_sleep (Number of observations: 83)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 11.10 0.42 10.28 11.92 1.00 4051 2987
## log_weight -0.78 0.13 -1.02 -0.53 1.00 3865 2709
##
## Further Distributional Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 3.74 0.30 3.23 4.37 1.00 3676 2744
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
pp_check(m_sleep_fit)
## Using 10 posterior draws for ppc type 'dens_overlay' by default.
p_sleep <- m_sleep_fit %>%
spread_draws(b_Intercept, b_log_weight, sigma)
d_sleep %>%
ggplot(aes(x = log_weight, y = sleep_total)) +
geom_point()+
geom_abline(aes( intercept=b_Intercept, slope=b_log_weight),
data = p_sleep[1:50, ], #first 50 posterior draws only
color = "#1c5253",
alpha = .3) +
labs(
x = "weight (log)",
y = "sleep"
)
We also have gather_draws():
m42.2 %>%
gather_draws(b_Intercept, b_height_c, sigma) %>%
sample_n(2)
## # A tibble: 6 × 5
## # Groups: .variable [3]
## .chain .iteration .draw .variable .value
## <int> <int> <int> <chr> <dbl>
## 1 2 3366 7366 b_Intercept 99.0
## 2 3 270 8270 b_Intercept 99.2
## 3 3 3707 11707 b_height_c 43.6
## 4 1 2361 2361 b_height_c 42.7
## 5 3 3845 11845 sigma 9.78
## 6 2 48 4048 sigma 9.24
How is this different from spread_draws()?
gather_draws() is useful when we want to summarize the
results of our models.
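The relationship between the two is essentially wide versus long format: gather_draws() returns what you would get by pivoting the spread_draws() output into .variable/.value columns. A toy illustration with an ordinary tibble:

```r
library(tidyverse)

# Wide format, like spread_draws(): one column per parameter
wide <- tibble(.draw = 1:2,
               b_Intercept = c(99.1, 99.3),
               sigma = c(9.4, 9.2))

# Long format, like gather_draws(): one row per draw-parameter pair
long <- wide %>%
  pivot_longer(-.draw, names_to = ".variable", values_to = ".value")
long  # 4 rows x 3 columns
```

The long format is what makes grouped summaries like median_qi() and plots faceted by .variable so easy.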
m42.2 %>%
gather_draws(b_Intercept, b_height_c, sigma) %>%
median_qi()
## # A tibble: 3 × 7
## .variable .value .lower .upper .width .point .interval
## <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 b_height_c 42.1 38.3 46.1 0.95 median qi
## 2 b_Intercept 99.2 98.2 100. 0.95 median qi
## 3 sigma 9.38 8.72 10.1 0.95 median qi
m42.2 %>%
gather_draws(b_Intercept, b_height_c, sigma) %>%
ggplot(aes(x = .value, y=.variable)) +
stat_halfeye()
We can make two kinds of predictions based on our model. First, we
can get a posterior predictive distribution using
add_predicted_draws():
# simulate new data
height_c = sample(d$height_c, size = 1e2, replace = T)
# get predictions
predictions = data.frame(height_c) %>% add_predicted_draws(m42.2, seed = 1)
dim(predictions)
## [1] 1600000 6
head(predictions)
## # A tibble: 6 × 6
## # Groups: height_c, .row [1]
## height_c .row .chain .iteration .draw .prediction
## <dbl> <int> <int> <int> <int> <dbl>
## 1 0.365 1 NA NA 1 109.
## 2 0.365 1 NA NA 2 116.
## 3 0.365 1 NA NA 3 108.
## 4 0.365 1 NA NA 4 130.
## 5 0.365 1 NA NA 5 119.
## 6 0.365 1 NA NA 6 108.
Or, we can get expected values using
add_epred_draws():
# get expected values
expected_vals = data.frame(height_c) %>% add_epred_draws(m42.2, seed = 1)
dim(expected_vals)
## [1] 1600000 6
head(expected_vals)
## # A tibble: 6 × 6
## # Groups: height_c, .row [1]
## height_c .row .chain .iteration .draw .epred
## <dbl> <int> <int> <int> <int> <dbl>
## 1 0.365 1 NA NA 1 115.
## 2 0.365 1 NA NA 2 114.
## 3 0.365 1 NA NA 3 115.
## 4 0.365 1 NA NA 4 114.
## 5 0.365 1 NA NA 5 116.
## 6 0.365 1 NA NA 6 116.
predictions %>% full_join(expected_vals) %>%
pivot_longer(c(.prediction, .epred)) %>%
ggplot(aes(x=value, group = name)) +
geom_density(aes(fill=name), alpha=.5)
## Joining with `by = join_by(height_c, .row, .chain, .iteration, .draw)`
predictions %>% full_join(expected_vals) %>%
pivot_longer(c(.prediction, .epred)) %>%
sample_n(size = 200) %>%
mutate(height = height_c + m_height) %>%
ggplot(aes(x=height, y=value, group = name)) +
geom_point(alpha = .3) +
facet_wrap(~name)
## Joining with `by = join_by(height_c, .row, .chain, .iteration, .draw)`
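The difference between the two is whether sigma is involved: .epred draws vary only through uncertainty in mu = a + b*x, while .prediction draws add observation noise on top. A base-R sketch of the idea, using illustrative point values for the parameters rather than actual posterior draws:

```r
set.seed(1)
a <- 99.2; b <- 42.1; sigma <- 9.4; x <- 0.5  # illustrative values

mu <- a + b * x                            # expected value (what .epred tracks)
y_new <- rnorm(1e4, mean = mu, sd = sigma) # predictions: mu plus residual noise

sd(y_new)  # close to sigma, much wider than the spread of the epred values
```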
Using the tools we learned (spread_draws(),
gather_draws(), etc.):
m_sleep_fit %>%
gather_draws(b_Intercept, b_log_weight, sigma) %>%
median_qi
## # A tibble: 3 × 7
## .variable .value .lower .upper .width .point .interval
## <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 b_Intercept 11.1 10.3 11.9 0.95 median qi
## 2 b_log_weight -0.776 -1.02 -0.527 0.95 median qi
## 3 sigma 3.71 3.23 4.37 0.95 median qi
log_weight = sample(d_sleep$log_weight, replace = T, size = 10)
predictions = data.frame(log_weight) %>% add_predicted_draws(m_sleep_fit, seed = 1)
head(predictions)
## # A tibble: 6 × 6
## # Groups: log_weight, .row [1]
## log_weight .row .chain .iteration .draw .prediction
## <dbl> <int> <int> <int> <int> <dbl>
## 1 -3.58 1 NA NA 1 11.9
## 2 -3.58 1 NA NA 2 14.8
## 3 -3.58 1 NA NA 3 11.9
## 4 -3.58 1 NA NA 4 18.9
## 5 -3.58 1 NA NA 5 15.6
## 6 -3.58 1 NA NA 6 11.4
predictions %>%
ggplot(aes(x = .prediction)) +
geom_density(aes(x = sleep_total), data = msleep) +
geom_histogram(aes(y = after_stat(density)), fill = "#1c5253", color = "white", alpha = .3)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
If you want Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO):
loo1 <- loo(m42.2, save_psis = T)
loo1
##
## Computed from 16000 by 352 log-likelihood matrix.
##
## Estimate SE
## elpd_loo -1288.5 14.0
## p_loo 3.2 0.4
## looic 2577.1 27.9
## ------
## MCSE of elpd_loo is 0.0.
## MCSE and ESS estimates assume MCMC draws (r_eff in [0.8, 1.1]).
##
## All Pareto k estimates are good (k < 0.7).
## See help('pareto-k-diagnostic') for details.
And for the widely applicable information criterion (WAIC):
waic(m42.2)
##
## Computed from 16000 by 352 log-likelihood matrix.
##
## Estimate SE
## elpd_waic -1288.5 14.0
## p_waic 3.2 0.4
## waic 2577.1 27.9
Remember, these are primarily used to compare multiple models. See
the loo package for more functions to help you compare
models and identify influential data points.
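For example, once you have two or more fitted models, loo_compare() ranks them by expected predictive accuracy (sketched here with m42.1 and m42.2 from above):

```r
# Compare the two height-weight models by PSIS-LOO
loo1 <- loo(m42.1)
loo2 <- loo(m42.2)
loo_compare(loo1, loo2)  # best model first; elpd_diff and se_diff reported
```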
Try adding a new predictor to your model (e.g., vore -
what type of food the animal eats). How does this change your
predictions?